
mem: Add faster tiered buffer pool #8775

Merged
arjan-bal merged 8 commits into grpc:master from arjan-bal:effecient-tiered-buffer-pool
Feb 16, 2026

Conversation

@arjan-bal
Contributor

@arjan-bal arjan-bal commented Dec 17, 2025

This change adds a new tiered buffer pool that uses power-of-2 tier sizes, reducing the lookup time for the relevant sizedBufferPool from $O(\log n)$ to $O(1)$, where n is the number of tiers. Because lookups are constant-time regardless of the tier count, users can add more tiers without performance overhead.

Benchmarks

Micro-benchmark that measures only the pool query performance, ignoring the allocation time:

func BenchmarkSearch(b *testing.B) {
	defaultBufferPoolSizes := make([]int, len(defaultBufferPoolSizeExponents))
	for i, exp := range defaultBufferPoolSizeExponents {
		defaultBufferPoolSizes[i] = 1 << exp
	}
	b.Run("pool=Tiered", func(b *testing.B) {
		p := NewTieredBufferPool(defaultBufferPoolSizes...).(*tieredBufferPool)
		for b.Loop() {
			for size := range 1 << 19 {
				// One for get, one for put.
				_ = p.getPool(size)
				_ = p.getPool(size)
			}
		}
	})

	b.Run("pool=BinaryTiered", func(b *testing.B) {
		p := NewBinaryTieredBufferPool(defaultBufferPoolSizeExponents...).(*binaryTieredBufferPool)
		for b.Loop() {
			for size := range 1 << 19 {
				_ = p.poolForGet(size)
				_ = p.poolForPut(size)
			}
		}
	})
}

With 5 tiers:

go test -bench=BenchmarkSearch -count=10 -benchmem | benchstat -col '/pool' -
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
          │   Tiered    │            BinaryTiered             │
          │   sec/op    │   sec/op     vs base                │
Search-48   5.353m ± 2%   2.036m ± 0%  -61.97% (p=0.000 n=10)

          │   Tiered   │          BinaryTiered          │
          │    B/op    │    B/op     vs base            │
Search-48   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

          │   Tiered   │          BinaryTiered          │
          │ allocs/op  │ allocs/op   vs base            │
Search-48   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

With 9 tiers:

go test -bench=BenchmarkSearch -count=10 -benchmem | benchstat -col '/pool' -
goos: linux
goarch: amd64
pkg: google.golang.org/grpc/mem
cpu: Intel(R) Xeon(R) CPU @ 2.60GHz
          │   Tiered    │            BinaryTiered             │
          │   sec/op    │   sec/op     vs base                │
Search-48   5.659m ± 0%   2.035m ± 0%  -64.04% (p=0.000 n=10)

          │   Tiered   │          BinaryTiered          │
          │    B/op    │    B/op     vs base            │
Search-48   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

          │   Tiered   │          BinaryTiered          │
          │ allocs/op  │ allocs/op   vs base            │
Search-48   0.000 ± 0%   0.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal

RELEASE NOTES:

  • mem: Add faster tiered buffer pool. Use NewBinaryTieredBufferPool to create such pools.

@arjan-bal arjan-bal added this to the 1.79 Release milestone Dec 17, 2025
@arjan-bal arjan-bal added Type: Performance Performance improvements (CPU, network, memory, etc) Area: Transport Includes HTTP/2 client/server and HTTP server handler transports and advanced transport features. labels Dec 17, 2025
@arjan-bal arjan-bal requested review from dfawley and easwars December 17, 2025 10:25
@codecov

codecov bot commented Dec 17, 2025

Codecov Report

❌ Patch coverage is 73.33333% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.66%. Comparing base (81a00ce) to head (3b88c26).
⚠️ Report is 83 commits behind head on master.

Files with missing lines Patch % Lines
mem/buffer_pool.go 73.33% 10 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8775      +/-   ##
==========================================
- Coverage   83.22%   80.66%   -2.56%     
==========================================
  Files         418      416       -2     
  Lines       32385    33495    +1110     
==========================================
+ Hits        26952    27019      +67     
- Misses       4050     4663     +613     
- Partials     1383     1813     +430     
Files with missing lines Coverage Δ
mem/buffer_pool.go 81.66% <73.33%> (-14.95%) ⬇️

... and 85 files with indirect coverage changes


@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from bdccc9b to 1f26e66 Compare December 17, 2025 12:28
Contributor

@easwars easwars left a comment


I thought we decided to make it possible for the user to set the default buffer pool for the whole process through an experimental API, and get rid of the existing dial option and the server option. Was your plan to do that in a follow-up?

@easwars easwars assigned arjan-bal and unassigned easwars and dfawley Dec 17, 2025
@arjan-bal
Contributor Author

I thought we decided to make it possible for the user to set the default buffer pool for the whole process through an experimental API, and get rid of the existing dial option and the server option. Was your plan to do that in a follow-up?

I was waiting for the author of #8770 to raise a PR for exposing a function to set the default buffer pool. See the second part of #8770 (comment).

get rid of the existing dial option and the server option

I'm not sure if we want to do this. Maybe people want to use different buffer pools for each channel.


In this PR, I'm just improving the existing buffer pool implementation.

@arjan-bal arjan-bal assigned easwars and unassigned arjan-bal Dec 18, 2025
@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from b50277e to 8e7ecc4 Compare December 18, 2025 19:51
@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from 8e7ecc4 to 36d34f7 Compare December 18, 2025 19:54
@easwars
Contributor

easwars commented Jan 20, 2026

Also, would it make sense to add some of the micro benchmarks that you used as part of this PR? Thanks.

@easwars easwars assigned arjan-bal and unassigned easwars Jan 20, 2026
@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from 3f63cea to e467b97 Compare February 3, 2026 11:15
@arjan-bal arjan-bal assigned easwars and unassigned arjan-bal Feb 3, 2026
@arjan-bal
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new binaryTieredBufferPool which provides a significant performance improvement for buffer pool lookups by using power-of-2 tier sizes for O(1) lookups. The implementation is clever, leveraging math/bits for efficient calculations. The accompanying tests are thorough, including architecture-specific checks and benchmarks that demonstrate the performance gains. I've found a minor issue in one of the new benchmark tests that could lead to a panic. Overall, this is an excellent contribution that improves performance and is well-implemented.

@arjan-bal arjan-bal force-pushed the effecient-tiered-buffer-pool branch from 1502569 to f32f229 Compare February 3, 2026 18:18
Contributor

@easwars easwars left a comment


LGTM modulo minor nits

Comment on lines +47 to +58
var defaultBufferPoolSizeExponents = []uint8{
8,
goPageSizeExponent,
14, // 16KB (max HTTP/2 frame size used by gRPC)
15, // 32KB (default buffer size for io.Copy)
20, // 1MB
}

var defaultBufferPool BufferPool
var (
defaultBufferPool BufferPool
uintSize = bits.UintSize // use a variable for mocking during tests.
)
Contributor

Nit: Group all these vars in one block, or don't have a block at all (like you currently do for consts).

Contributor Author

Grouped all the vars and consts in a single block.

// Allocating slices of size > 2^maxExponent isn't possible on
// maxExponent-bit machines.
if int(exp) > maxExponent {
return nil, fmt.Errorf("allocating slice of size 2^%d is not possible", exp)
Contributor

Nit: prefix the error string with package name?

Contributor Author

Added the prefix.

}

func (b *binaryTieredBufferPool) Put(buf *[]byte) {
b.poolForPut(cap(*buf)).Put(buf)
Contributor

Can we add a comment here capturing the subtle but important fact that we are passing the capacity of the buffer, and not the size of the buffer to poolForPut. If we did the latter, all buffers would eventually move to the smallest pool.

Contributor Author

Added a comment.

wantErr bool
}{
{
name: "32-bit valid exponent",
Contributor

Nit: Make subtest names more like identifiers. go/go-style/decisions#subtest-names

Contributor Author

Done.

t.Errorf("NewBinaryTieredBufferPool() error = %t, wantErr %t", err, tt.wantErr)
return
}
if err == nil {
Contributor

Nit: invert the conditional and return early for tests where an error was expected and was seen.

Contributor Author

Done.

for b.Loop() {
for size := range 1 << 19 {
_ = p.poolForGet(size)
_ = p.poolForPut(size)
Contributor

Does it matter that we are not passing capacity here to poolForPut?

Contributor Author

It is true that this benchmark doesn't use the buffer's expected capacity, but the results should still be similar. The benchmark intentionally avoids fetching a buffer so that it measures only the pool lookup overhead, excluding allocation time. While we could determine the expected capacity by type-asserting the pool to sizedBufferPool and reading its defaultSize field, that would make the benchmark brittle by relying on a private implementation detail, so I'm not making the change at this time.

@easwars easwars assigned arjan-bal and unassigned easwars Feb 13, 2026
@arjan-bal
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new binaryTieredBufferPool which provides O(1) lookup time for buffer pools by using power-of-2 tier sizes. The implementation is well-structured, includes comprehensive tests, and the benchmarks demonstrate a significant performance improvement over the existing tieredBufferPool. I have one suggestion to handle duplicate exponents in the pool configuration to prevent potential resource leaks.

arjan-bal and others added 2 commits February 16, 2026 11:34
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@arjan-bal arjan-bal merged commit 3be7e2d into grpc:master Feb 16, 2026
14 checks passed
@arjan-bal arjan-bal deleted the effecient-tiered-buffer-pool branch February 16, 2026 06:25
Pranjali-2501 pushed a commit to Pranjali-2501/grpc-go that referenced this pull request Feb 23, 2026
@ash2k
Contributor

ash2k commented Mar 7, 2026

I'm not sure if we want to do this. Maybe people want to use different buffer pools for each channel.

Very much agree with this. As a user, I'd prefer to not have any global state at all. I'd prefer all things to be tied to a client or to a server. I can share what I want to share across instances of these by myself. This is a more flexible approach.
